An Optimized Floating-Point Matrix Multiplication on FPGA

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

FPGA accelerator for floating-point matrix multiplication

This study treats architecture and implementation of a FPGA accelerator for double-precision floating-point matrix multiplication. The architecture is oriented towards minimising resource utilisation and maximising clock frequency. It employs the block matrix multiplication algorithm which returns the result blocks to the host processor as soon as they are computed. This avoids output buffering...

متن کامل

Energy Performance of Floating-Point Matrix Multiplication on FPGAs

Floating-point matrix multiplication is a basic kernel in scientific computing. It has been shown that implementations of this kernel on FPGAs can achieve high sustained performance [1]. However, to the best of our knowledge, existing work on FPGA-based floating-point matrix multiplication considers the optimization of latency or area only. In this paper, we analyze the impact of various parame...

متن کامل

An Optimized Matrix Multiplication on ARMv7 Architecture

A sufficiently optimized matrix multiplication on embedded systems can facilitate data processing in high performance mobile measuring equipment since plenty of the kernel mathematical algorithms are based on matrix multiplication. In this paper, we propose a matrix multiplication specially optimized for ARMv7 architecture. The performance-critical differences between ARMv7 and conventional des...

متن کامل

Design and Implementation of an Optimized Double Precision Floating Point Divider on FPGA

Floating-point division is generally regarded as a low frequency, high latency operation in typical floating-point applications.So due to this not much development had taken place in this field. But nowadays floating point divider has become indispensable and increasingly important in many modern applications. Most of the previous implementation required much larger area and latencies. In this ...

متن کامل

An Efficient LUT Design on FPGA for Memory-Based Multiplication

An efficient Lookup Table (LUT) design for memory-based multiplier is proposed.  This multiplier can be preferred in DSP computation where one of the inputs, which is filter coefficient to the multiplier, is fixed. In this design, all possible product terms of input multiplicand with the fixed coefficient are stored directly in memory. In contrast to an earlier proposition Odd Multiple Storage ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Information Technology Journal

سال: 2013

ISSN: 1812-5638

DOI: 10.3923/itj.2013.1832.1838